A Bayesian Reanalysis of the Overall and Sex-Disaggregated Results of the Neonatal Oxygenation Prospective Meta-Analysis (NeOProM)

Data from the Neonatal Oxygenation Prospective Meta-analysis (NeOProM) indicate that targeting a higher (91–95%) versus lower (85–89%) pulse oximeter saturation (SpO2) range may reduce mortality and necrotizing enterocolitis (NEC) and increase retinopathy of prematurity (ROP). Aiming to re-evaluate the strength of this evidence, we conducted a Bayesian reanalysis of the NeOProM data. We used Bayes factors (BFs) to evaluate the likelihood of the data under the combination of models assuming the presence vs. absence of effect, heterogeneity, and moderation by sex. The Bayesian reanalysis showed moderate evidence in favor of no differences between SpO2 targets (BF10 = 0.30) in death or major disability, but moderate evidence (BF10 = 3.60) in favor of a lower mortality in the higher SpO2 group. Evidence in favor of differences was observed for bronchopulmonary dysplasia (BPD) (BF10 = 14.44, lower rate with lower SpO2), severe NEC (BF10 = 9.94), and treated ROP (BF10 = 3.36). The only outcome with moderate evidence in favor of sex differences was BPD. This reanalysis of the NeOProM trials confirmed that exposure to a lower versus higher SpO2 range is associated with a higher mortality and risk of NEC, but a lower risk of ROP and BPD. The Bayesian approach can help in assessing the strength of evidence supporting clinical decisions.


Introduction
The Oxygen Paradox states that while oxygen is essential for aerobic life forms, it is also inherently dangerous to those same life forms [1]. Arguably, the moment of life in which this paradox manifests itself most prominently is the transition from intrauterine to extrauterine life. At birth, a newborn is exposed to the oxidative shock of transitioning from the relative hypoxia of fetal life to the atmospheric normoxia of postnatal life [2][3][4]. While this transition is generally smooth in term infants, because they have adequate antioxidant defenses, this is not the case in premature infants. Furthermore, the physiologically hypoxic intrauterine environment is a major stimulus for the development of organs and systems [5,6]. Preterm birth disrupts this physiological development, forcing immature organs and systems to assume their physiological functions too early. This alters the type of signals and stimuli that these organs and systems will receive for their subsequent development. Two other aspects need to be taken into account. The first is that an environment of oxidative stress may already have been induced by the pathological condition, or endotype, responsible for preterm birth [7]. The second is that therapeutic interventions, such as oxygen supplementation, mechanical ventilation, or parenteral nutrition, together with postnatal exposure to infectious inflammatory processes, may

Materials and Methods
This study was exempt from obtaining formal institutional review board approval and from the requirement to obtain informed patient consent because it is secondary research of a publicly available data set [11].
The sex-specific data from each of the five studies included in the NeOProM were reentered into a new database, and the log risk ratio (logRR) of each individual study, with its corresponding standard error and 95% confidence interval (CI), was calculated using COMPREHENSIVE META-ANALYSIS V4.0 software (Biostat Inc., Englewood, NJ, USA). The results were further pooled and analyzed by a Bayesian-model-averaged (BMA) meta-regression [45], a moderation analysis extension to BMA meta-analysis [40,41]. We performed the BMA in R using the RoBMA R package [46]. BMA employs Bayes factors (BFs) and Bayesian model averaging to evaluate the likelihood of the data under the combination of models assuming the presence vs. the absence of the meta-analytic effect, heterogeneity, and moderation [40,41,45]. The BF10 is the ratio of the probability of the data under H1 over the probability of the data under H0. We used the categories proposed by Lee & Wagenmakers for the interpretation of the BFs [47]. The evidence in favor of H1 (BF10 > 1) was categorized as weak/inconclusive (1 < BF10 < 3), moderate (3 < BF10 < 10), strong (10 < BF10 < 30), very strong (30 < BF10 < 100), and extreme (BF10 > 100). The evidence in favor of H0 (BF10 < 1) was categorized as weak/inconclusive (1/3 < BF10 < 1), moderate (1/10 < BF10 < 1/3), strong (1/30 < BF10 < 1/10), very strong (1/100 < BF10 < 1/30), and extreme (BF10 < 1/100). The BFrf is the ratio of the probability of the data under the random effects model over the probability of the data under the fixed effect model, and BFmod is the ratio of the probability of the data under the moderated models (i.e., by sex differences) vs. the non-moderated models. Furthermore, BFs for the presence vs. absence of the effect at each level of the moderator (e.g., BFfemale, BFmale) were calculated using the Savage-Dickey density ratio [48,49]. The categories of strength of the evidence in favor of the random effects (BFrf > 1) or the fixed effect (BFrf < 1), differences by sex (BFmod > 1) or absence of differences by sex (BFmod < 1), and the presence of the effect by sex subgroups (BFfemale > 1, BFmale > 1) or absence of the effect by sex subgroups (BFfemale < 1, BFmale < 1) were the same as those described above for BF10.
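The two computational steps described above that do not depend on the BMA machinery can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline (which used Comprehensive Meta-Analysis and the RoBMA R package): the function names are hypothetical, the standard error is the usual large-sample formula for a log risk ratio, and the handling of values that fall exactly on a category boundary (assigned here to the stronger category) is an assumption, because the categories above are defined with strict inequalities.

```python
import math


def log_risk_ratio(events_a, total_a, events_b, total_b):
    """Return (logRR, SE, 95% CI on the RR scale) for group A vs. group B."""
    risk_a = events_a / total_a
    risk_b = events_b / total_b
    log_rr = math.log(risk_a / risk_b)
    # Large-sample standard error of the log risk ratio.
    se = math.sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    z = 1.959964  # 97.5th percentile of the standard normal distribution
    ci = (math.exp(log_rr - z * se), math.exp(log_rr + z * se))
    return log_rr, se, ci


def classify_bf10(bf10):
    """Map a BF10 value to (favored hypothesis, evidence category)."""
    if bf10 <= 0:
        raise ValueError("A Bayes factor must be positive.")
    if bf10 == 1:
        return "neither", "no evidence"
    favored = "H1" if bf10 > 1 else "H0"
    # Evidence for H0 is graded on the reciprocal scale (BF01 = 1 / BF10).
    strength = bf10 if bf10 > 1 else 1 / bf10
    if strength < 3:
        category = "weak/inconclusive"
    elif strength < 10:
        category = "moderate"
    elif strength < 30:
        category = "strong"
    elif strength < 100:
        category = "very strong"
    else:
        category = "extreme"
    return favored, category


# Hypothetical counts (30/100 vs. 20/100 events), for illustration only.
log_rr, se, ci = log_risk_ratio(30, 100, 20, 100)

# Bayes factors reported in this reanalysis.
for bf in (0.30, 3.60, 14.44, 99.49):
    print(bf, classify_bf10(bf))
```

Grading evidence for H0 on the reciprocal scale keeps a single set of thresholds (3, 10, 30, 100) for both directions, which mirrors the symmetric categories listed above.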

Results
The NeOProM reported 30 categorical outcomes disaggregated by sex. In addition, we pooled the data of positive pressure (with and without endotracheal tube) and supplemental oxygen at 36 weeks' postmenstrual age (PMA) to obtain an estimate of moderate-to-severe BPD as defined by Jobe & Bancalari [51]. The overall and sex-disaggregated BMA results are shown in Tables 1-3. These tables show the analyses with s = 1/4 (i.e., the expected difference in each moderator level corresponding to 1/4 of the mean effect size). Supplementary Tables S1-S3 show the results with s = 1/2 (i.e., the expected difference in each moderator level corresponding to 1/2 of the mean effect size). Supplementary Tables S4-S6 show the original results of the frequentist analysis [11] compared with the results of the Bayesian analysis. Regarding the overall results, the Bayesian analysis showed moderate evidence in favor of H0 (BF10 = 0.30) for the main outcome (death or major disability at 18-24 months' age corrected for prematurity) (Table 1, Figure 1). When the two components of this major outcome were analyzed separately, the Bayesian analysis showed moderate evidence in favor of H1 (BF10 = 3.60) for mortality (lower in the group exposed to the high SpO2 range) and moderate evidence in favor of H0 (BF10 = 0.21) for major disability (Table 1, Figure 1). The evidence in favor of H1 was also moderate when mortality was defined before 36 weeks' PMA (BF10 = 3.33), and before hospital discharge (BF10 = 3.15). The evidence in favor of H0 was moderate to inconclusive for the other definitions of major disability used by the investigators (Table 1).
Of the secondary outcomes related to disability, the Bayesian analysis showed moderate evidence in favor of H0 for "Bayley-III language and/or cognitive scale < 85" (BF10 = 0.22) and "Bayley-III language scale < 85" (BF10 = 0.30) (Table 2). With regard to other secondary outcomes, the Bayesian analysis showed that the evidence in favor of H1 was very strong for supplemental oxygen at 36 weeks' PMA (BF10 = 99.49, lower rate in lower SpO2 group), strong for moderate-to-severe BPD (BF10 = 14.44, lower rate in lower SpO2 group), moderate, bordering on strong, for severe NEC (BF10 = 9.94, lower rate in higher SpO2 group), and moderate for treated ROP (BF10 = 3.36, lower rate in lower SpO2 group) (Table 3, Figure 1). In addition, the analysis showed moderate evidence in favor of H0 for PDA that was medically or surgically treated (BF10 = 0.17), oxygen at discharge (BF10 = 0.30), and readmission to hospital (BF10 = 0.17) (Table 3).
With regard to the results disaggregated by sex, the BMA analysis showed marked differences between the BFs for males and females for two outcomes: supplemental oxygen at 36 weeks' PMA and moderate-to-severe BPD (Table 3, Figure 1). BMA regression showed that the only outcome with moderate evidence for sex differences (BFmod = 3.41) was moderate-to-severe BPD (Table 3, Figure 1).
To evaluate the robustness of the results, an additional analysis was performed with the s-value set to 1/2. As is shown in Supplementary Tables S1-S3, the use of an s-value of 1/2 did not produce substantial changes in the results.

Discussion
The NeOProM collaboration has provided the highest quality evidence on what SpO2 ranges are most appropriate for extremely preterm infants during the first weeks of life. The main contribution of this Bayesian reanalysis is that it allows an assessment of the strength of this evidence in a way that goes beyond the dichotomous categorization (significant vs. non-significant) of classical frequentist statistics. The Bayesian reanalysis showed that there is moderate evidence in favor of H0 (BF10 = 0.30) for the primary outcome of the NeOProM trials (death or major disability). In other words, there is moderate evidence of an absence of difference between the two SpO2 ranges. This evidence of no difference between the two saturation ranges was confirmed when major disability was analyzed separately from mortality (BF10 = 0.21). Interestingly, when mortality was examined separately from major disability, the Bayesian analysis showed moderate evidence (BF10 = 3.60) in favor of lower mortality in the group exposed to the higher SpO2 range. This confirms the results reported in the frequentist analysis as being "statistically significant" (p = 0.01) [11]. Regarding other outcomes, the Bayesian analysis confirmed that exposure to the higher saturation range was associated with a decreased risk of NEC but increased risk of ROP. In addition, the Bayesian analysis showed that the higher SpO2 range was associated with a higher risk of moderate-to-severe BPD. Finally, when the results were disaggregated by sex, the BMA regression showed moderate evidence of sex differences in the effects of SpO2 ranges on BPD.
In spite of the careful design of the NeOProM trials, there are physiological, technical, and implementation issues with the methods and interventions used in the RCTs that raise questions about the external validity and practical applicability of the findings [52][53][54][55][56][57]. Despite nearly identical protocols with similar pulse oximetry masking for the groups, the trials differed significantly in the saturations actually achieved relative to the target. In both the SUPPORT and COT trials, the median of the SpO2 distribution was higher than the target. The three BOOST II trials were the most successful in achieving the target for both groups in terms of median values [57]. In addition, the interpretation of the results was complicated by a revision of the calibration software for the study oximeters [56,57]. Furthermore, the two comparison groups may have been more similar than different because of inherent variability in pulse oximeter accuracy, lack of specification of probe placement, and differences in oxygen dissociation curves for fetal and adult hemoglobin [52][53][54][55][56][57]. Therefore, it has been argued that the NeOProM studies may not have been able to separate two true areas of oxygen exposure [52][53][54][55][56][57].
The results of the NeOProM trials were not significant for the primary outcome of death or major disability (RR 1.04, 95% CI 0.98 to 1.09, p = 0.21) [11]. In the case of nonsignificant or null results, clinicians need to be able to gauge the evidence of the absence of an effect [44]. However, the frequentist approach does not allow us to distinguish whether the null results indicate evidence for the absence of differences between the two saturation ranges or whether they are inconclusive (i.e., absence of evidence). Here, we have shown how this goal of distinguishing between the two situations can be achieved by using BFs. The BF10 for the outcome of death or major disability in the NeOProM trials was 0.30. Consequently, the BF01 was 3.33 (1/0.30 = 3.33). This means that the data are 3.33 times more likely under H0 than under H1, which is considered moderate evidence in favor of H0 (no differences between the two saturation ranges).
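The arithmetic in the preceding paragraph can be reproduced as a short sketch. The reciprocal relation BF01 = 1/BF10 comes directly from the text; the conversion to a posterior probability is an added illustration that assumes equal prior odds for H0 and H1 (prior probability 0.5), which is not a prior specification taken from this reanalysis. The function names are hypothetical.

```python
def bf01_from_bf10(bf10):
    """Evidence for H0 relative to H1 is the reciprocal of BF10."""
    return 1.0 / bf10


def posterior_prob_h0(bf10, prior_prob_h0=0.5):
    """Posterior probability of H0 given BF10 and a prior probability of H0.

    The default prior of 0.5 (equal prior odds) is an illustrative assumption.
    """
    prior_odds_h0 = prior_prob_h0 / (1.0 - prior_prob_h0)
    # Posterior odds = prior odds x Bayes factor (here in favor of H0).
    posterior_odds_h0 = prior_odds_h0 * bf01_from_bf10(bf10)
    return posterior_odds_h0 / (1.0 + posterior_odds_h0)


bf10 = 0.30  # death or major disability
print(round(bf01_from_bf10(bf10), 2))     # 3.33
print(round(posterior_prob_h0(bf10), 3))  # 0.769 under equal prior odds
```

Under that assumption, a BF01 of 3.33 moves a 50% prior probability of no difference to roughly 77%, which gives a sense of why this level of evidence is labeled moderate rather than strong.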
Despite the limitations mentioned above, the RCTs included in the NeOProM collaboration showed differences between the two saturation ranges in several key outcomes, including mortality. The SUPPORT trial, the first of the NeOProM collaboration to report in-hospital outcomes, reported no difference in the composite primary outcome of death or ROP, but showed evidence that targeting SpO2 in the lower range (85% to 89%) was associated with an unanticipated higher mortality rate (RR 1.27; 95% CI, 1.01 to 1.60; p = 0.04) [13]. A subsequent safety meta-analysis of the SUPPORT trial along with the three BOOST II trials reported significantly lower mortality in the higher-target group (91% to 95%). As a result, enrollment in two of the BOOST II trials was stopped early because further enrollment could cause harm to participants [58]. Finally, the NeOProM confirmed the higher mortality associated with the lower SpO2 range (RR 1.17, 95% CI 1.04 to 1.31, p = 0.01) [11]. The present Bayesian reanalysis showed that the evidence for this finding was moderate (BF10 = 3.60). In addition, the Bayesian analysis showed that the evidence was moderate to strong for increased rates of severe NEC but lower rates of severe ROP and moderate-to-severe BPD in the group exposed to the lower SpO2 range. Differences in BPD are difficult to interpret because the definition of BPD is based on the need for oxygen and/or respiratory support [51]. It is plausible that if the target saturation is higher, there is a greater likelihood that oxygen will be required to reach that target. Interestingly, the Bayesian analysis showed moderate evidence of no difference between the two saturation ranges when the outcome was mechanical ventilation. This suggests that the development of the more severe forms of lung damage would not be affected by the target SpO2 range.
Regarding other complications, NEC and ROP are both conditions that neonatologists strive to prevent because they have a major impact on the outcome of prematurity. The fact that one is associated with the low SpO2 range and the other with the high SpO2 range raises the clinical dilemma of accepting higher ROP rates to reduce both mortality and NEC rates. A growing number of observational studies have reported an increase in the rate of severe ROP in association with the introduction of higher SpO2 ranges [59][60][61][62]. However, other investigators have not confirmed this increase in ROP [63,64]. Interestingly, neither the differences in ROP nor the differences in NEC ultimately had an effect on the neurodevelopment of the infants in the NeOProM studies. As mentioned above, Bayesian analysis showed moderate evidence in favor of H0 for the major disability outcome. In addition, despite the higher rate of ROP in the group exposed to the high SpO2 range, the Bayesian analysis showed inconclusive evidence in favor of H0 (BF10 = 0.83) for the outcome of visual impairment at 18 to 24 months of age.
The underrepresentation of female participants in adult RCTs is a growing concern because low inclusion rates of women may create a lack of crucial knowledge of the adverse effects and the benefit/risk profile of any given treatment [65]. In the case of RCTs conducted in the neonatal population, it appears very unlikely that an imbalance in the inclusion of one of the sexes may occur, but it should be noted that the baseline risk of morbidity and mortality is different for males and females [18][19][20]. Therefore, reporting sex-stratified outcomes for both efficacy and adverse events is of high importance [18,21]. When we analyzed the potential sex differences in the various outcomes of the NeOProM studies, we found that there were marked differences between males and females in the strength of evidence for moderate-to-severe BPD, and oxygen requirement at 36 weeks' PMA (Table 3). However, just as it would be wrong to conclude that the presence of a statistically significant (p < 0.05) association for males combined with no significance (p > 0.05) for females implies that there is a sex difference [66], we cannot conclude that the presence of evidence supporting H1 for one sex and H0 for the other is evidence in favor of a difference [67]. When we tested, using BMA-regression, the possible interaction of biological sex with the different outcomes, we found that the evidence in favor of H1 was moderate for the outcome of moderate-to-severe BPD, but inconclusive or in favor of H0 (absence of sex differences) for the rest of the outcomes.
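The frequentist half of this caution can be made concrete with a small sketch. The numbers below are hypothetical and are not taken from the NeOProM data, and the z-test of the difference between two subgroup log risk ratios is a simple frequentist analogue of a moderation test, not the BMA meta-regression used in this reanalysis. It shows one subgroup reaching p < 0.05 and the other not, while the direct test of the difference between them is far from significant.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def subgroup_p(log_rr, se):
    """p-value for the effect within a single subgroup (H0: logRR = 0)."""
    return two_sided_p(log_rr / se)


def interaction_p(log_rr_a, se_a, log_rr_b, se_b):
    """p-value for the difference between two independent subgroup effects."""
    diff = log_rr_a - log_rr_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return two_sided_p(diff / se_diff)


# Hypothetical subgroup estimates, for illustration only.
log_rr_male, se_male = math.log(1.30), 0.12
log_rr_female, se_female = math.log(1.15), 0.12

p_male = subgroup_p(log_rr_male, se_male)        # below 0.05
p_female = subgroup_p(log_rr_female, se_female)  # above 0.05
p_diff = interaction_p(log_rr_male, se_male, log_rr_female, se_female)

print(f"male p = {p_male:.3f}, female p = {p_female:.3f}, "
      f"difference p = {p_diff:.3f}")
```

The same logic motivates testing moderation directly (BFmod) rather than comparing BFmale with BFfemale.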
The NeOProM project is a major achievement and a milestone in international neonatal research collaboration and has had a profound impact on clinical practice. The potential association of the lower SpO2 range (85-89%) with increased mortality and development of NEC led many NICUs worldwide to implement saturation ranges close to the high ranges studied in the trials (91-95%), as recommended by scientific panels and organizations [68][69][70]. However, it should be noted that preterm infants probably have individually different susceptibility to the damage caused by either hypoxia or hyperoxia [52][53][54][55][56][57]. Factors such as perinatal and neonatal comorbidity, gestational and postnatal age, growth, or therapeutic interventions may have an impact on the severity and extent of hypoxic or oxidative stress. Therefore, it is unlikely that a single narrow SpO2 range can be found that would be safe for all extremely preterm infants [52][53][54][55][56][57]. Nevertheless, it does not appear that the efforts of the neonatal research community will be directed towards conducting new RCTs on SpO2 limits. In a search of https://clinicaltrials.gov/ (accessed on 10 January 2024), we could not find any ongoing RCT focusing on SpO2 limits in extremely preterm infants outside of the birth resuscitation period.
In conclusion, the present Bayesian reanalysis of the NeOProM trials confirmed that there is moderate evidence that exposure to a SpO2 range of 85-89% versus a range of 91-95% is associated with a higher mortality rate in extremely preterm infants. There is moderate evidence, bordering on strong (BF10 = 9.94), that the higher SpO2 range is associated with a lower rate of severe NEC and moderate evidence that it is associated with a higher rate of severe ROP. Finally, Bayesian reanalysis showed strong evidence for an association between a higher SpO2 range and BPD. This association was more apparent in females than in males, suggesting the presence of sex differences in pulmonary susceptibility to oxygen supplementation in extremely preterm infants. The Bayesian approach may provide a new perspective on the scientific evidence from RCTs and meta-analyses, and can help in assessing the strength of evidence that supports clinical decisions.

Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/antiox13050509/s1, Table S1: Bayesian-model-averaged (BMA) regression of the outcome death and/or major disability in the NeOProM trials (analyses with s = 1/2); Table S2: Bayesian-model-averaged (BMA) regression of the outcomes related to neurodevelopmental impairment in the NeOProM trials (analyses with s = 1/2); Table S3: Bayesian-model-averaged (BMA) regression of secondary outcomes in the NeOProM trials (analyses with s = 1/2); Table S4: Comparison between the Bayes Factor (BF) values of the Bayesian-model-averaged (BMA) analysis and the p-values of the original NeOProM frequentist analysis for the outcomes of death and/or major disability; Table S5: Comparison between the Bayes Factor (BF) values of the Bayesian-model-averaged (BMA) analysis and the p-values of the original NeOProM frequentist analysis for the outcomes related to neurodevelopmental impairment; Table S6: Comparison between the Bayes Factor (BF) values of the Bayesian-model-averaged (BMA) analysis and the p-values of the original NeOProM frequentist analysis for the secondary outcomes.
Author Contributions: E.V. conceived and designed the study, with input from the other authors. M.J.H. and T.M.H. screened and reviewed the search results, and abstracted the data. E.V. checked data extraction for accuracy and completeness. F.B. designed and conducted the Bayesian analysis, with input from the other authors. All authors contributed to the interpretation of analyses. M.J.H. and E.V. constructed the figures and tables. E.V. and F.B. drafted the manuscript, with input from the other authors. All authors reviewed the manuscript and provided important intellectual content. E.V. and M.J.H. take responsibility for the article as a whole. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding, nor any specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Institutional Review Board Statement: As this meta-analysis did not involve animal subjects or personally identifiable information on human subjects, ethics review board approval was not required.

Informed Consent Statement: As this systematic meta-analysis did not involve animal subjects or personally identifiable information on human subjects, patient consent was not required.

Figure 1. Bayesian reanalysis of the results of the NeOProM study. (a) Summary of the overall and sex-disaggregated results. RR > 1 indicates higher risk with lower SpO2 range (85-89% vs. 91-95%); (b) Summary of Bayes factors (BFs) calculated through Bayesian-model-averaged (BMA) meta-regression. The BF10 is shown for the overall results and the BFfemale and BFmale for the results disaggregated by sex; BPD: bronchopulmonary dysplasia; NEC: necrotizing enterocolitis; ROP: retinopathy of prematurity.


Table 1. Bayesian-model-averaged (BMA) regression of the outcome death and/or major disability in the NeOProM trials.

Table 2. Bayesian-model-averaged (BMA) regression of the outcomes related to neurodevelopmental impairment in the NeOProM trials.

Table 3. Bayesian-model-averaged (BMA) regression of secondary outcomes in the NeOProM trials.